45 research outputs found

Towards Higher Quality Internal and Outside Multilingualization of Web Sites

Author: Bellynck Valérie
Boitet Christian
Mangeot Mathieu
Ramisch Carlos
Publication venue: Centre for Indian Language Technology (CFILT), IITB (Indian Institute of Technology Bombay); proceedings, international audience, invited conference
Publication date: 16/07/2008

[International audience] High-quality multilingualization of Web sites is increasingly important, but it is unsolvable in most situations where internal quality certification is needed, and unsolved in the majority of other situations. We show this by analyzing a variety of techniques for making the underlying software easily localizable and for managing the translation of textual content in the classical internal mode, that is, by modifying the language-dependent resources. A new idea is that volunteer end users should be able to contribute to the improvement, or even the production, of translated resources and content. To this end, we have developed a piece of PHP code that naive webmasters (neither computer scientists nor professional translators) can add to a Web site to enable internal multilingualization by users with sufficient access rights: in management mode, these users can edit the texts of titles, button labels, messages, etc. in text areas that appear in context in the Web page. If Web site developers follow some recommendations, all textual interface elements should be localizable in this way. Another angle of attack, applicable whenever a site can be navigated through a gateway, consists in replacing the problem of diffusion by the problem of access in multiple languages. We introduce the concept of iMAG (interactive Multilingual Access Gateway, dedicated to a Web site or domain) to solve the problem of higher-quality multilingual access. First, by using available MT systems or, by default, morphological processors and bilingual dictionaries, any page of an elected Web site is made instantly accessible in many languages, with a generally low quality profile, as through usual translation gateways. Over time, the quality profile of textual GUI elements, Web pages and even documents (if accessible in HTML) will improve thanks to outside contributors, who will post-edit or produce the translations from the reading context.
This is only possible because the iMAG associated with the Web site stores the translations in its translation memory (TM) and the contributed dictionary items in its dictionary. The TM has quality levels, according to the users' profiles, and scores within levels. An API will be proposed so that the developers of the elected Web site can connect their site to its iMAG, retrieve the best-level translations, certify them if necessary, and put them in their localized resources. At that point, external localization meets internal localization.

Hal - Université Grenoble Alpes
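The translation memory with quality levels and scores within levels, as described above, can be sketched as follows. This is a minimal illustration under stated assumptions, not the iMAG implementation or its announced API. The `TranslationMemory` and `Candidate` names, the integer level scale (0 for raw MT, higher for more trusted contributor profiles) and the numeric score are all hypothetical. The only rule taken from the abstract is that levels come first and scores only rank candidates within a level.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Candidate:
    text: str
    level: int    # hypothetical quality level from the contributor's profile (0 = raw MT)
    score: float  # hypothetical score within that level (e.g. votes)


@dataclass
class TranslationMemory:
    # source segment -> target language -> candidate translations
    entries: Dict[str, Dict[str, List[Candidate]]] = field(default_factory=dict)

    def add(self, source: str, lang: str, text: str, level: int, score: float = 0.0) -> None:
        self.entries.setdefault(source, {}).setdefault(lang, []).append(
            Candidate(text, level, score))

    def best(self, source: str, lang: str, min_level: int = 0) -> Optional[Candidate]:
        """Return the best candidate: highest level wins, score breaks ties within a level."""
        candidates = [c for c in self.entries.get(source, {}).get(lang, [])
                      if c.level >= min_level]
        if not candidates:
            return None
        return max(candidates, key=lambda c: (c.level, c.score))
```

With a raw MT candidate at level 0 and a reviewer's post-edit at level 2, `best` returns the level-2 candidate even if the level-0 one has a higher score. A `min_level` threshold is one plausible way for site developers to fetch only translations good enough to certify.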

Operationalization of interactive Multilingual Access Gateways (iMAGs) in the Traouiero project

Author: Bellynck Valérie
Boitet Christian
Falaise Achille
Nguyen Hong-Thai
Publication venue: HAL CCSD
Publication date: 17/11/2011

[International audience] We will explain and demonstrate iMAGs (interactive Multilingual Access Gateways), in particular on a scientific laboratory Web site and on the Greater Grenoble (La Métro) Web site. This bilingual presentation has been obtained using an iMAG. This presentation is an adaptation and update of an article presented (as a demonstration only) at TALN-2010. The file names have been kept the same, although their contents are slightly different. The iMAG concept was proposed by Ch. Boitet and V. Bellynck in 2006 (Boitet et al. 2008, Boitet et al. 2005) and reached prototype status in November 2008, with a first demonstration on the LIG laboratory Web site. It was adapted to the DSR (Digital Silk Road) Web site in April 2009, and then to more than 50 other Web sites. These first prototypes are extensions of the SECTra_w (Huynh et al. 2008) online translation corpora support system. Since the beginning of 2011, we have been operationalizing this software with a view to deploying it as a multilingual access infrastructure, in the context of the French ANR (National Agency for Research) Traouiero "emergence" project. At first sight, an iMAG is an interactive Multilingual Access Gateway very much like Google Translate: one gives it a URL (the starting Web site) and an access language, and then navigates in that access language. When the cursor hovers over a segment (usually a sentence or a title), a palette shows the source segment and invites the reader to contribute by correcting the target segment, in effect post-editing an MT result. With Google Translate, the page does not change after a contribution, and if another page contains the same segment, its translation is still the rough MT result, not the polished post-edited version.
The more recent Google Translator Toolkit lets one machine-translate and then post-edit full Web pages online from sites such as Wikipedia, but again the corrected segments do not appear when one later browses the Wikipedia page in the access language. By contrast, an iMAG is dedicated to an elected Web site, or rather to the elected sublanguage defined by one or more URLs and their textual content. It contains a translation memory (TM) and a specific, preterminological dictionary (pTD), both dedicated to the elected sublanguage. Segments are pretranslated not by a single MT system, but by a (selectable) set of MT systems. Systran and Google are mainly used now, but specialized systems based on Moses and built from the post-edited part of the TM will also be used in the future. The powerful online contributive platforms SECTra_w and PIVAX (Nguyen et al. 2007) are used to support the TMs and pTDs. Translated pages are built with the best segment translations available so far. While reading a translated page, one can not only contribute to the segment under the cursor, but also seamlessly switch to the SECTra_w online post-editing environment, equipped with proactive dictionary help and good filtering and search-and-replace functions, and then back to the reading context. A translation relay is being implemented to define the iMAGs or other translation gateways used by an elected Web site, to select and parameterize the MT systems and translation routes used for various language pairs, and to manage users, groups, projects (some contributions may be organized, others opportunistic), and access rights. Finally, MT systems tailored to the selected sublanguage can be built (by combining empirical and expert methods) from the TM and the pTD dedicated to a given elected Web site. That approach will inherently raise the linguistic and terminological quality of the MT results, hopefully converting them from rough into raw translations.
The demonstration will use some iMAGs created by the AXiMAG startup for various Web sites, such as those of the LIG lab (http://service.aximag.fr:8180/xwiki/bin/view/imag/liglab) and of the La Métro (Greater Grenoble) Web site (http://service.aximag.fr:8180/xwiki/bin/view/imag/lametro), where access in Chinese and English was enabled in 2010 for the Shanghai Expo.

Hal - Université Grenoble Alpes
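Building a translated page from "the best segment translations available so far" can be illustrated with the following sketch. It assumes, hypothetically, that a site's post-edited segments are stored in a plain dictionary and that each selected MT system is a callable returning `None` when it cannot translate. Neither assumption reflects the actual SECTra_w or iMAG interfaces. The sketch shows why a post-edit made on one page is reused on every page containing the same segment, unlike the Google Translate behavior described above.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical MT system interface: source segment -> translation, or None on failure.
MTSystem = Callable[[str], Optional[str]]


def translate_page(segments: List[str],
                   postedited: Dict[str, str],
                   mt_systems: List[MTSystem]) -> List[str]:
    """Build a translated page segment by segment.

    Prefer the stored post-edited translation; otherwise use the first selected
    MT system that answers; otherwise leave the source segment untranslated.
    """
    page = []
    for segment in segments:
        if segment in postedited:
            page.append(postedited[segment])
            continue
        for mt in mt_systems:
            output = mt(segment)
            if output is not None:
                page.append(output)
                break
        else:  # no MT system produced a translation
            page.append(segment)
    return page
```

Because `postedited` is shared by all pages of the elected site, a correction contributed once is picked up wherever the same segment occurs. The order of `mt_systems` plays the role of the selectable set of MT systems.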

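Multilinguisation d'ontologies dans le cadre de la recherche d'information translingue dans des collections d'images accompagnées de textes spontanés (Multilingualization of ontologies for cross-language information retrieval in collections of images accompanied by spontaneous texts)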

Author: Bellynck Valérie
Boitet Christian
Rouquet David
Publication date: 01/01/2012

The World Wide Web is a proliferating source of multimedia objects described in various natural languages. In order to use Semantic Web techniques to retrieve such objects (images, videos, etc.), we propose a method for extracting content from multilingual text collections, parameterized by one or several ontologies. The content extraction process is used both to index multimedia objects from their textual content and to build formal queries from spontaneous user utterances.
The process is based on an interlingual annotation of texts that keeps ambiguities (segmentation and polysemy) in graphs. This first step allows disambiguation processes to be factorized at the level of a pivot lexicon of interlingual lexemes. An ontology is passed as a parameter to the system by automatically aligning its elements with the interlingual lexemes of the pivot lexicon. It is thus possible to use ontologies that were not designed for multilingual use, and to add or extend languages and their lexical coverage without modifying the ontologies. A demonstrator for multilingual image retrieval, developed for the ANR OMNIA project, put the proposed approaches into practice and made it possible to evaluate the scalability and the quality of the annotations produced.

OpenGrey Repository
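The pipeline above reduces to a toy sketch: interlingual annotation keeps polysemy, and ontology concepts are then reached through an alignment with interlingual lexemes. Everything here is an assumption for illustration. The lexicon, the lexeme identifiers such as `dog.animal`, and the alignment are hand-written, whereas the original work builds graphs that also keep segmentation ambiguities and aligns ontologies automatically. This sketch treats text as a bag of words and ignores segmentation.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical pivot lexicon: (language, word form) -> interlingual lexemes.
# Polysemy is kept: a word maps to every lexeme it may denote.
Lexicon = Dict[Tuple[str, str], Set[str]]
# Hypothetical alignment: ontology concept -> interlingual lexemes aligned with it.
Alignment = Dict[str, Set[str]]


def annotate(text: str, lang: str, lexicon: Lexicon) -> Set[str]:
    """Collect all candidate lexemes for the words of a text (no disambiguation)."""
    lexemes: Set[str] = set()
    for word in text.lower().split():
        lexemes |= lexicon.get((lang, word), set())
    return lexemes


def concepts(lexemes: Set[str], alignment: Alignment) -> Set[str]:
    """Map interlingual lexemes to the ontology concepts aligned with them."""
    return {concept for concept, aligned in alignment.items() if aligned & lexemes}


def search(query: str, lang: str, index: Dict[str, Set[str]],
           lexicon: Lexicon, alignment: Alignment) -> List[str]:
    """Return the objects whose indexed concepts overlap those of the query."""
    wanted = concepts(annotate(query, lang, lexicon), alignment)
    return sorted(obj for obj, indexed in index.items() if indexed & wanted)
```

Captions and queries are both mapped to interlingual lexemes, so an image captioned in English can be found from a French query. The ontology itself carries no language-specific labels, which is what lets languages be added without modifying it.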

Exploitation d'une base lexicale dans le cadre de la conception de l'ENPA Innovalangues (Operating a lexical database in the framework of the Innovalangues language learning platform)

Author: Bellynck Valérie
Eggers Emmanuelle
Goudin Yoann
Loiseau Mathieu
Mangeot Mathieu
Publication venue: HAL CCSD
Publication date: 04/07/2016

[National audience] The Innovalangues project aims to design a personalized digital environment for language learning (ENPA). In this context, several modules, such as serious games and exercise generators, need a lexical database. It is also useful to let learners manage their own lexicons. To avoid each module developing its own lexical database, it quickly became essential to develop a single common lexical database that can serve both modules (machines) and learners (humans). After analyzing needs through usage scenarios, we set up a multilingual lexical database architecture. We then built a working prototype that integrates with the environment and allows a learner to look up existing lexical resources and create their own lexicon.
The LexInnova prototype uses the Jibiki platform to manage heterogeneous lexical resources remotely via its REST application programming interface (API).

Hal - Université Grenoble Alpes

MT on and for the Web

Author: Christian Boitet
Getalp
Hervé Blanchon
Mark Seligman
Valérie Bellynck
Publication date: 11/04/2020

A Systran MT server became available on the Minitel network in 1984, and on the Internet in 1994. Since then, we have come to a better understanding of the nature of MT systems by separately analyzing their linguistic, computational, and operational architectures. Also, thanks to the CxAxQ metatheorem, the systems' inherent limits have been clarified, and design choices can now be made in an informed manner according to the translation situation. MT evaluation has also matured: tools based on reference translations are useful for measuring progress, those based on subjective judgments for estimating future usage quality, and task-related objective measures (such as post-editing distances) for measuring operational quality. Moreover, the same technological advances that led to "Web 2.0" have brought several futuristic predictions to fruition. Free Web MT services have democratized assimilation MT beyond belief. Speech translation research has given rise to usable systems for restricted tasks running on PDAs or on mobile phones connected to servers. New man-machine interface techniques have made interactive disambiguation usable in large-coverage multimodal MT. Increases in computing power have made statistical methods workable, and have led to the possibility of building low-linguistic-quality but still useful MT systems by machine learning from aligned bilingual corpora (SMT, EBMT). In parallel, progress has been made in developing interlingua-based MT systems using hybrid methods. Unfortunately, many misconceptions about MT have spread among the public, and even among MT researchers, because of ignorance of the past and present of MT R&D. A compensating factor is the willingness of end users to contribute freely to building essential parts of the linguistic knowledge needed to construct MT systems, whether corpus-related or lexical.
Finally, some developments we anticipated fifteen years ago have not yet materialized, such as online writing tools equipped with interactive disambiguation, and, as a corollary, the possibility of transforming source documents into self-explaining documents (SEDs) and of producing corresponding SEDs fully automatically in several target languages. These visions should now be realized, thanks to the evolution of Web programming and multilingual NLP techniques, leading towards a true Semantic Web, "Web 3.0", which will support ubilingual (ubiquitous multilingual) computing.

CiteSeerX
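Post-editing distance, mentioned above as a task-related objective measure, can be approximated by a word-level edit distance between the MT output and its post-edited version. The sketch below is a simplified stand-in: it computes Levenshtein distance over words and normalizes it by the length of the post-edited text. Established metrics such as HTER also count block shifts, which this version does not.

```python
from typing import List


def edit_distance(a: List[str], b: List[str]) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        cur = [i]
        for j, word_b in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                        # delete word_a
                           cur[j - 1] + 1,                     # insert word_b
                           prev[j - 1] + (word_a != word_b)))  # substitute or match
        prev = cur
    return prev[-1]


def postedit_distance(mt_output: str, postedited: str) -> float:
    """Edits needed to turn the MT output into its post-edited version,
    normalized by the length of the post-edited text."""
    hyp, ref = mt_output.split(), postedited.split()
    return edit_distance(hyp, ref) / max(len(ref), 1)
```

A score of 0.0 means the MT output was accepted unchanged. Higher values mean more post-editing work per word of the final text, which is the kind of operational quality the abstract refers to.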

Introduction d'une vue textuelle synchronisée avec la vue géométrique primaire dans Cabri-II (Introduction of a textual view synchronized with the primary geometric view in Cabri-II)

Author: Bellynck Valérie
Publication venue: HAL CCSD
Publication date: 29/10/1999

Cabri-géomètre is a software package for exploring geometric figures by directly manipulating the geometric objects that make them up. It immerses the user in an intelligent microworld and thus constitutes a learning environment for geometry. Users can build geometric figures, explore the range of animations and deformations of the construction, develop new tools with macro-constructions, and specialize their environment for specific tasks, possibly by integrating their own tools. The software offers programming-by-demonstration capabilities, but users often need to manipulate the logical structure of the program they have built in order to debug and master it. The choice of a particular form for presenting this program takes into account the specific features of the dynamic geometry domain and the diversity of users. In our prototyping work, we specified and implemented a textual representation, while leaving open the possibility of complementing it with a graph. The user profile was taken into account in defining the form of this text: the formalization of a programming language underlying the direct visual constructions must not become a constraint, and users should become familiar with this language (the means of communication between the user and the software) without conscious effort. These requirements led to the integration into Cabri-II of a textual view of figures that is equivalent to the graphical view and as dynamic as the figure (in the sense that the program is built at the same time as the figure), and in which the ubiquity of objects across the synchronized views enables implicit learning of the Cabri programming language.
The "dynamic quality" of the geometry in the figure is reflected in the "formal quality" of the induced language, and manipulations in the interface are transcribed as animations of the text. The approach of starting from visual programming and making it explicit as textual programming is new, raises specific and interesting problems, and could fairly quickly be completed and then profitably applied to other similar environments.

Hal - Université Grenoble Alpes
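The idea of a textual view derived from the same construction model as the figure, so that both views stay synchronized, can be sketched as follows. This is a hypothetical illustration, not Cabri-II code. The `Construction` class, the step kinds and the narrative sentence pattern are invented for the example. Deleting an object removes everything built on it, and the text view changes accordingly because it is regenerated from the shared model.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Step:
    name: str                 # object created by this step, e.g. "A"
    kind: str                 # hypothetical kinds: "point", "segment", "midpoint", ...
    parents: Tuple[str, ...]  # objects this one is constructed from


class Construction:
    """Hypothetical shared model: the figure and the text are both rendered from it."""

    def __init__(self) -> None:
        self.steps: List[Step] = []

    def add(self, name: str, kind: str, *parents: str) -> None:
        self.steps.append(Step(name, kind, tuple(parents)))

    def delete(self, name: str) -> None:
        """Remove an object and, transitively, every object built on it."""
        doomed = {name}
        # Steps are kept in creation order, so parents always precede children.
        for step in self.steps:
            if doomed.intersection(step.parents):
                doomed.add(step.name)
        self.steps = [s for s in self.steps if s.name not in doomed]

    def text_view(self) -> List[str]:
        """Narrative textual view: one sentence per construction step."""
        lines = []
        for step in self.steps:
            if step.parents:
                lines.append(f"Let {step.name} be a {step.kind} defined from "
                             f"{' and '.join(step.parents)}.")
            else:
                lines.append(f"Let {step.name} be a free {step.kind}.")
        return lines
```

In this design, the text grows step by step as the figure is built, mirroring the abstract's point that the program is constructed at the same time as the figure. A graphical renderer would read the same `steps` list.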

Introduction d’un support textuel pour expliciter la programmation dans Cabri-II (Introduction of textual support for making programming explicit in Cabri-II)

Author: Bellynck Valérie
Publication venue: PERSEE Program
Publication date: 01/01/2001

Users of dynamic geometry software face problems linked to the computational nature of building figures (geometric constructions). The direct manipulation of objects supported by the software lets users work directly on the result of executing their program, but manipulating the results alone is not enough to debug a construction once the behavior obtained differs from the desired one. To make explicit the programming activity underlying the use of Cabri-géomètre, while remaining suited to its target audience of non-programmers, we introduced a language for describing the figure that reveals its underlying structure while staying close to the user's actions. We associated with this "narrative" language a textual view that supports the explicit manipulation of construction structures, and thus of the implicit computing concepts of programming in Cabri-géomètre.
Bellynck Valérie. Introduction d’un support textuel pour expliciter la programmation dans Cabri-II. In: Sciences et techniques éducatives, volume 8 n°3-4, 2001. Interaction homme-machine pour la formation et l'apprentissage humain, edited by Elisabeth Delozanne and Pierre Jacobini. pp. 347-377

Introduction d'une vue textuelle synchronisée avec la vue géométrique primaire dans Cabri-II (Introduction of a textual view synchronized with the primary geometric view in Cabri-II)

Author: Bellynck Valérie
Publication venue: HAL CCSD
Publication date: 29/10/1999

Thèses en Ligne

Hal - Université Grenoble Alpes

Valérie Bellynck, Introduction d'une vue textuelle synchronisée avec la vue géométrique primaire dans Cabri-II. Doctoral thesis in computer science, Université de Grenoble, 29 October 1999

Author: Bellynck Valérie
Publication venue: PERSÉE : Université de Lyon, CNRS & ENS de Lyon
Publication date: 01/01/1999

Bellynck Valérie. Valérie Bellynck, Introduction d'une vue textuelle synchronisée avec la vue géométrique primaire dans Cabri-II. Doctoral thesis in computer science, Université de Grenoble, 29 October 1999. In: Sciences et techniques éducatives, volume 6 n°2, 1999. p. 451

Un devin de microstructures pour importer ou normaliser des ressources lexicales (A microstructure guesser for importing or normalizing lexical resources)

Author: Bellynck Valérie
Mangeot Mathieu
Publication venue: HAL CCSD
Publication date: 27/09/2018

[International audience] In this article, we present a tool for annotating an unstructured or semi-structured resource in order to automate the import of its data into a generic environment that facilitates and standardizes its use. This tool relies on computing a signature, which characterizes the resource with counts about its elements, and on heuristics that use the signature statistics to identify the entries and the microstructure of each entry, in the case of a lexical resource to be imported into the Jibiki lexical database platform. The result is managed in iPoLex, a lexical data warehouse dedicated to processing lexical resource files and adding metadata to them. The Jibiki platform accesses iPoLex to directly import lexical resources that are well-formed and sufficiently documented. The resulting import process is used to create and populate lexical databases in Jibiki. The imported resources can then be consulted and edited online.

Hal - Université Grenoble Alpes

HAL Université de Savoie
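The signature-plus-heuristics approach of the microstructure guesser can be illustrated on an XML resource. This sketch assumes, hypothetically, that the input is already XML, whereas the tool described above also handles unstructured and semi-structured resources. It uses a single invented heuristic: the entry element is the child tag repeated most often under one parent, among tags that themselves have children. The tag names in the example (`e`, `hw`, `pos`, `tr`) are made up.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, Optional


def signature(root: ET.Element) -> Counter:
    """Count how often each (parent tag, child tag) pair occurs in the resource."""
    sig: Counter = Counter()
    for parent in root.iter():
        for child in parent:
            sig[(parent.tag, child.tag)] += 1
    return sig


def guess_entry_tag(sig: Counter) -> Optional[str]:
    """Hypothetical heuristic: the most repeated child tag that itself has children."""
    tags_with_children = {parent for (parent, _) in sig}
    candidates = {pair: n for pair, n in sig.items() if pair[1] in tags_with_children}
    if not candidates:
        return None
    (_, entry), _ = max(candidates.items(), key=lambda kv: kv[1])
    return entry


def microstructure(root: ET.Element, entry_tag: str) -> Dict[str, float]:
    """Share of entries in which each child tag appears at least once."""
    entries = root.findall(f".//{entry_tag}")
    present = Counter(tag for e in entries for tag in {c.tag for c in e})
    return {tag: n / len(entries) for tag, n in present.items()}
```

On a `<dict>` with three `<e>` entries, the pair (`e`, `tr`) may be the most frequent overall. However, `tr` has no children, so `e` is chosen as the entry tag. `microstructure` then reports which fields every entry has (`hw`, `tr`) and which are optional (`pos`).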